Fish & Richardson p.c. 




VIA EXPRESS MAIL



225 Franklin Street
Boston, Massachusetts
02110-2804



November 9, 2000 



Telephone 
617 542-5070 



Facsimile 



Box Patent Application 

Commissioner for Patents 
Washington, DC 20231 



Attorney Docket No.: US-4190-CIP/10454-014002




Presented for filing is a new continuation-in-part patent application of: 

Applicant: ALFONSO VALDES, KEITH SKINNER AND PHILLIP ANDREW
PORRAS 

Title: SENSOR AND ALERT CORRELATION IN INTRUSION 
DETECTION SYSTEMS 

Enclosed are the following papers, including those required to receive a filing date 
under 37 CFR §1.53(b):



Enclosures: 

— Postcard, 

— Appendix (19 pages) 

This application is a continuation-in-part (and claims the benefit of priority under 
35 USC §120) of U.S. application serial no. 09/653,066, filed September 1, 2000. The
disclosure of the prior application is considered part of (and is incorporated by 
reference in) the disclosure of this application. 



Document          Pages
Specification     10
Claims            3
Abstract          1
Declaration       2 [Unsigned]
Drawing(s)        4



CERTIFICATE OF MAILING BY EXPRESS MAIL 



Express Mail Label No. EL 624 275 981 US 



I hereby certify under 37 CFR §1.10 that this correspondence is being
deposited with the United States Postal Service as Express Mail Post
Office to Addressee with sufficient postage on the date indicated below
and is addressed to the Commissioner for Patents, Washington, DC 20231.




Typed or Printed Name of Person Signing Certificate 






This application is entitled to small entity status. A small entity statement will be 
filed at a later date. 

Basic filing fee $355 

Total claims in excess of 20 times $9 $0 

Independent claims in excess of 3 times $40 $120

Fee for multiple dependent claims $0 

Total filing fee: $475 

A check for the filing fee is enclosed. Please apply any other required fees or any 
credits to deposit account 06-1050, referencing the attorney docket number shown 
above. 

If this application is found to be incomplete, or if a telephone conference would 
otherwise be helpful, please call the undersigned at (617) 542-5070. 

Kindly acknowledge receipt of this application by returning the enclosed postcard. 

Please send all correspondence to: 

DAVID L. FEIGENBAUM 
Fish & Richardson P.C. 
225 Franklin Street 
Boston, MA 02110-2804 

Respectfully submitted,




Kenneth F. Kozik 
Reg. No. 36,572 



Enclosures 
KFK/dmm 



20154936.doc



APPLICATION FOR 
UNITED STATES PATENT 

in the name of 

Alfonso Valdes, Keith Skinner and
Phillip Andrew Porras

of 

SRI International, Inc. 
for 

Sensor and Alert Correlation in Intrusion Detection Systems



Kenneth F. Kozik 

Fish & Richardson P.C. 
225 Franklin Street 
Boston, MA 02110-2804 
Tel.: (617)542-5070 
Fax: (617)542-8906 



ATTORNEY DOCKET: 



BOS-10454-KFK1 10900-1 



DATE OF DEPOSIT: November 9, 2000 

EXPRESS MAIL NO.: EL 624 275 981 US



Attorney Docket No. US-4190-CIP/10454-014002 

Sensor and Alert Correlation in Intrusion Detection Systems 

Reference to Government Funding 

This invention was made with Government support under contract number 
F30602-99-C-0149 awarded by the Air Force Research Laboratory. The Government has 
certain rights in this invention. 

Related Application 

This is a continuation-in-part of U.S. patent application serial number 09/653,066,
filed September 1, 2000 and entitled "Methods for Detecting and Diagnosing
Abnormalities Using Real-Time Bayes Networks," which is incorporated herein by
reference.

Reference to Related Documents 

This invention incorporates by reference a paper entitled "Probabilistic
Approaches to Alert Management," by A. Valdes, attached as an appendix.

Background 

The invention relates generally to intrusion detection, and more specifically to
sensor and alert correlation in intrusion detection systems.

There are three main types of intrusion detection systems currently used to detect 
hacker attacks on computer networks: signature analysis systems, statistical analysis 
systems, and systems based on probabilistic reasoning. Signature analysis systems 
compare current data traffic patterns with stored traffic patterns representing the signature
or profile of various types of hacker attacks. These systems generate an alert if the pattern
of traffic received by the network matches one of the stored attack patterns.

Statistical analysis systems compare current data traffic patterns with statistical
profiles of previous traffic patterns. These systems generate an alert if a current traffic
pattern is significantly different from a stored profile of "normal" traffic. Examples of
both statistical- and signature-based intrusion detection systems are described in Porras
et al., "Live Traffic Analysis of TCP/IP Gateways," Internet Society's Network and
Distributed System Security Symposium, March 1998.
Examples of intrusion detection systems based on probabilistic reasoning are
described in U.S. patent application serial number 09/653,066 entitled "Methods for
Detecting and Diagnosing Abnormalities Using Real-Time Bayes Networks." In one
such system, a Bayes network is established that includes models (called "hypotheses")
that represent both normal traffic and attack traffic received by a computer network.
Traffic actually received by the network is examined in real time to identify its relevant
characteristics or features, such as volume of data transfer, number of erroneous
connection requests, nature of erroneous connection requests, ports to which connections
are attempted, etc. Information about these relevant features is then provided to the
Bayes network, which calculates a system belief (a probability) that the current network
traffic is either normal traffic or attack traffic.

A typical intrusion detection system may include one or more sensors that
monitor network traffic in the manner discussed above, and one or more other sensors
that monitor network resources. A system operator or network administrator (usually a
person) reviews all of the alerts generated by the system.

A major problem with existing intrusion detection systems is that they often
provide misleading, incomplete, or low-quality information to the system operator; this
may make it impossible to take proper steps to protect the network. For example, during
a large-scale hacker attack each of the system's sensors may generate hundreds of alerts.
Although each alert may be accurate, the sheer number of alerts could easily overwhelm
the system operator. Moreover, false alarms can be triggered by normal traffic directed
towards a malfunctioning network resource, such as a server. These false alarms could
distract the system operator from alerts triggered by actual hacker attacks. Finally, most
intrusion detection systems are unable to detect low-level attacks such as port sweeps, in
which hackers slowly "probe" a network to discover its structure and weaknesses.



Summary 

This invention uses probabilistic correlation techniques to increase sensitivity,
reduce false alarms, and improve alert report quality in intrusion detection systems. In
one preferred embodiment, an intrusion detection system includes at least two sensors to
monitor different aspects of a computer network, such as a sensor that monitors network
traffic and a sensor that discovers and monitors available network resources. The sensors
are correlated in that the belief state of one sensor is used to update or modify the belief
state of another sensor. For example, a network resource sensor may detect that a server
is malfunctioning. This information is used to modify the belief state of one or more
network traffic sensors so that normal network traffic directed towards the
malfunctioning server does not trigger a false alarm.

In another example, a network resource sensor transmits its knowledge of network
structure to a network traffic sensor. This information is used to modify the belief state
of the network traffic sensor so that an attempt to communicate with a non-existent
resource appears to be suspicious. By allowing different sensors to share information, the
system's sensitivity to low-level attacks can be greatly increased, while at the same time
greatly reducing the number of false alarms.

In another embodiment of this invention, probabilistic correlation techniques are
used to organize alerts generated by different types of sensors. By comparing features of
each new alert with features of previous alerts, and adjusting the comparison by an
expectation that certain feature values will or will not match, the alerts can be grouped in
an intelligent manner. The system operator may then be presented with a few groups of
related alerts, rather than dozens of individual alerts from multiple sensors.

Brief Description of the Drawings

Figure 1 shows a preferred intrusion detection system with coupled sensors.

Figure 2 is a flowchart showing a preferred method for coupling sensors.

Figure 3 shows a preferred intrusion detection system with coupled sensors and
an alert correlation device.

Figure 4 is a flowchart showing a preferred method for grouping alerts into
classes of related alerts.
Detailed Description

I. Introduction 

In preferred embodiments, intrusion detection systems for computer networks
include sensors that monitor both network traffic and network resources. Correlation
techniques are used to increase system sensitivity, reduce false alarms, and organize
alerts into related groups. The sensor and alert correlation techniques described below
may be used in intrusion detection systems that include any type or mix of sensors.
However, probabilistic sensors similar to those described in U.S. Patent Application
Serial Number 09/653,066 may provide advantages over other types of sensors.

Bayes networks are preferably used to perform the probabilistic reasoning
functions described below. As is more fully described in U.S. Patent Application Serial
Number 09/653,066, Bayes networks include models (called "hypotheses") that represent
some condition or state; probabilistic reasoning methods are used to determine a belief
that a current observation corresponds to one of the stored hypotheses. Before a Bayes
network performs an analysis, an initial belief (called a "prior probability") regarding the
various hypotheses is established. During the analysis, a new belief based on actual
observations is established, and the prior probability is adjusted accordingly. In the
context of the embodiments described here, a relevant Bayes hypothesis might be "there
is a stealthy portsweep attack against the computer network." Belief in the portsweep
attack hypothesis would be strengthened if attempts to connect to several invalid or
nonexistent ports are observed.
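The belief update just described can be illustrated with a minimal sketch of Bayes' rule over two hypotheses. The hypothesis names and the likelihood values in `P_INVALID_PORT` are assumptions chosen for illustration, not values taken from application serial number 09/653,066, and a full Bayes network would combine many observed features rather than the single feature used here.

```python
def bayes_update(prior, likelihood):
    """Return the posterior belief P(h | observation) for each hypothesis h.

    prior: dict mapping hypothesis -> prior probability (sums to 1)
    likelihood: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical likelihoods: probability that one connection attempt targets
# an invalid or nonexistent port under each hypothesis.
P_INVALID_PORT = {"normal": 0.01, "portsweep": 0.6}

# Prior probability: a portsweep is initially considered unlikely.
belief = {"normal": 0.99, "portsweep": 0.01}
history = [belief["portsweep"]]

# Each observed attempt to reach an invalid port strengthens the portsweep
# hypothesis; the posterior then serves as the prior for the next observation.
for _ in range(5):
    belief = bayes_update(belief, P_INVALID_PORT)
    history.append(belief["portsweep"])

print([round(p, 3) for p in history])
```

Because each posterior becomes the next prior, a handful of individually weak observations can move the belief from very unlikely to near certainty, which is the behavior the portsweep example relies on.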

II. Intrusion Detection with Correlated Sensors 

As was discussed above, intrusion detection systems typically have several
independent sensors that gather different types of information. In preferred embodiments
of this invention, the information gathered by the various sensors may be shared to
improve overall performance of the intrusion detection system.

In a preferred intrusion detection system 100 shown in Figure 1, sensors are
coupled so that the belief state of one sensor (such as network resource sensor 103)
affects the belief state of another sensor (such as network traffic sensor 101). Each
sensor may then transmit alerts to a system operator or network administrator 105. These
sensors need not be probabilistic sensors; for example, a sensor that keeps track of the
number of servers in a network would not need to generate a probabilistic output.

Figure 2 illustrates a preferred method by which sensors are coupled or correlated.
At step 201, a first (preferably probabilistic) sensor receives all or part of the belief state
of a second (not necessarily probabilistic) sensor. The belief state of the second sensor
may indicate an apparent normal, degraded, or compromised state of a monitored system
resource, the existence or validity of supported services, or any other relevant belief state
held by a sensor in an intrusion detection system. At step 203, a prior belief state of the
first sensor is adjusted, the adjustment based at least in part on the second sensor's belief
state. For example, the prior belief state of the first sensor may be adjusted so that an
erroneous transaction with a damaged or compromised network resource does not
generate an alert. In another example, the prior belief state of the first sensor may be
adjusted so that an attempted communication with a system server or resource that does
not exist appears to be suspicious. By allowing different sensors to share information, the
system's sensitivity to low-level attacks can be greatly increased, while at the same time
greatly reducing the number of false alarms.
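A minimal sketch of steps 201 and 203, assuming a probabilistic traffic sensor whose hypotheses are `normal`, `attack`, and a hypothetical `fault` state meaning that the target resource is malfunctioning. The resource sensor's belief that the server is degraded is used to set the prior on `fault` before the traffic sensor evaluates an erroneous transaction. All hypothesis names, likelihoods, the base attack prior, and the 0.5 alert threshold are illustrative assumptions.

```python
# Hypothetical P(erroneous transaction | hypothesis) for the traffic sensor.
ERROR_LIKELIHOOD = {"normal": 0.02, "attack": 0.5, "fault": 0.9}
ALERT_THRESHOLD = 0.5  # hypothetical: raise an alert when P(attack) exceeds this


def coupled_prior(base_attack_prior, p_resource_degraded):
    """Step 203: build the first sensor's prior from the second sensor's belief.

    The resource sensor's belief that the server is degraded becomes the
    prior mass on 'fault'; the remaining mass is split between 'normal'
    and 'attack' in the traffic sensor's usual proportions.
    """
    rest = 1.0 - p_resource_degraded
    return {
        "normal": rest * (1.0 - base_attack_prior),
        "attack": rest * base_attack_prior,
        "fault": p_resource_degraded,
    }


def posterior(prior, likelihood):
    """Bayes' rule: normalize prior * likelihood over all hypotheses."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}


def should_alert(p_resource_degraded, base_attack_prior=0.05):
    """Step 201 supplies p_resource_degraded; return True if an alert is raised."""
    prior = coupled_prior(base_attack_prior, p_resource_degraded)
    return posterior(prior, ERROR_LIKELIHOOD)["attack"] > ALERT_THRESHOLD


# Healthy server: an erroneous transaction looks like an attack.
print(should_alert(p_resource_degraded=0.001))  # prints True
# Resource sensor reports the server is degraded: no false alarm.
print(should_alert(p_resource_degraded=0.9))  # prints False
```

The second example of step 203 (a nonexistent service) could be sketched the same way, for instance by lowering the prior on ordinary traffic toward a service that the resource sensor reports does not exist; that variant is likewise an assumption rather than a detail of this description.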

III. Alert Correlation 

In another embodiment of this invention, probabilistic correlation techniques are
used to organize alerts into classes or groups of related alerts. By comparing features of
each new alert with features of previous alerts, and adjusting the comparison by an
expectation that certain feature values will or will not match, the alerts can be grouped in
an intelligent manner. Alerts may be grouped or organized in several different ways,
depending on both the type of an attack and the structures of the intrusion detection
system and the monitored network. For example, all alerts related to a specific attack
(such as a denial of service attack) may be grouped together. Or, all alerts related to the
various stages of a staged attack may be grouped together to allow the system operator to
view the progression of a staged attack.
A. Alert Correlation Devices

Figure 3 shows a preferred intrusion detection system 300 that includes a network
traffic sensor 301 and a network resource sensor 303. Intrusion detection system 300 may
include any number of sensors, and the belief states of the sensors may be correlated in
the manner discussed above. Alerts generated by sensors 301 and 303 are provided to an
alert correlation device 305, which uses probabilistic reasoning (preferably Bayesian)
techniques to organize alerts into classes of related alerts. These alert classes are then
provided to a system operator or network administrator 307.

B. Measurement of Similarity

To group alerts in an intelligent manner, it is necessary to define and determine
the degree of similarity between a new alert and an existing alert or alert class. This 
measurement of similarity is preferably calculated by comparing similarity among shared 
features (also called "measures") of the alert and alert class. Examples of features to 
consider when assessing similarity between a new alert and an existing alert class include 
source IP address, destination IP address and port, type of alert, type of attack, etc. The 
nature of an alert may change the expectation of which features should be similar. It is 
therefore important to define a measurement of similarity to take into account which 
features are candidates for matching. 

In the following discussion, X is defined to be the value of a feature or measure in 
a new alert, and Y is the value of that feature or measure in an existing alert class to 
which it is being compared. For features that have a single value, the features of a new 
alert and an existing alert class are defined to be similar if their feature values are equal. 
That is:

$$
\mathrm{Sim}(X,Y) = \begin{cases} 1, & X = Y \\ 0, & X \neq Y \end{cases}
$$

The more general case is one in which a feature of an alert includes a list of
observed values, and a "hit count" of the number of times each value has been observed.
For reasons of normalization, the hit count may be converted to a probability. In this
sense, X and Y are lists of feature and probability values, possibly of different lengths. A
probability vector describes a pattern over observed categories, where:

$P_X(C)$ = probability of category C in list X,
$P_Y(C)$ = probability of category C in list Y,
$P_X$ = probability vector over categories observed for X,
$P_Y$ = probability vector over categories observed for Y.

The notation $C \in X$ is used to denote that category C occurs in list X. The
similarity of the two lists is given by:

$$
\mathrm{Sim}(X,Y) = \frac{\left( \sum_{C \in X \text{ AND } C \in Y} P_X(C)\, P_Y(C) \right)^2}{\left( \sum_{C \in X} P_X(C)^2 \right) \left( \sum_{C \in Y} P_Y(C)^2 \right)}
$$



If the two lists are the same length, then this measure gives the square of the
cosine between the two. This has an intuitive geometric property that is not shared by,
say, the dot product as is commonly used in the pattern matching literature.

If the patterns (1, 0), (0, 1), and (0.5, 0.5) are over the same two features, then:

$$
\mathrm{Sim}(\{1,0\},\{0,1\}) = 0
$$

$$
\mathrm{Sim}(\{0.5,0.5\},\{0,1\}) = \mathrm{Sim}(\{0.5,0.5\},\{1,0\}) = 0.5
$$

In other words, the first two patterns are orthogonal, and the third is halfway in
between.
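The two similarity definitions can be sketched in Python as follows. Representing each list as a dictionary from category to probability, and the helper names `sim_single`, `to_probabilities`, and `sim_lists`, are assumptions made for illustration; the sketch reproduces the worked examples above.

```python
def sim_single(x, y):
    """Similarity of single-valued features: 1 if equal, else 0."""
    return 1.0 if x == y else 0.0


def to_probabilities(hit_counts):
    """Convert a {value: hit count} record into a {value: probability} vector."""
    total = sum(hit_counts.values())
    return {value: count / total for value, count in hit_counts.items()}


def sim_lists(p_x, p_y):
    """Squared-cosine similarity of two {category: probability} vectors.

    The numerator sums only over categories present in both lists;
    the denominator uses the squared norm of each full vector.
    """
    dot = sum(p * p_y[c] for c, p in p_x.items() if c in p_y)
    norm_x = sum(p * p for p in p_x.values())
    norm_y = sum(p * p for p in p_y.values())
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot * dot / (norm_x * norm_y)


# The worked examples from the text, over two categories "a" and "b".
print(sim_lists({"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 1.0}))  # 0.0
print(sim_lists({"a": 0.5, "b": 0.5}, {"a": 0.0, "b": 1.0}))  # 0.5
print(sim_lists({"a": 0.5, "b": 0.5}, {"a": 1.0, "b": 0.0}))  # 0.5

# Hit counts for destination ports, converted to probabilities first.
ports_x = to_probabilities({80: 3, 443: 1})
ports_y = to_probabilities({80: 1})
print(round(sim_lists(ports_x, ports_y), 3))  # 0.9
```

Normalizing hit counts first means that two alert classes with the same mix of values, but different numbers of contributing alerts, still score a similarity of 1.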

C. Expectation of Similarity 

As was discussed above, the nature of an alert may change an expectation of
which features of a new alert and an existing alert class should be similar. For example, a
SYN flood is a type of attack in which the source IP address is typically forged. In this
case, similarity in the source IP address would not be considered when assessing the
overall similarity between a new alert triggered by a SYN flood attack and an existing alert
class.

In a preferred embodiment, the expectation of similarity is a feature-specific
number between 0 and 1, with 1 indicating a strong expectation of a match, and 0
indicating that two otherwise similar alerts are likely not similar with respect to a specific
feature.

An alert state is preferably represented as a probability vector over likely alert
states. Each feature has a vector of the same length as the alert state vector, whose
elements are the similarity expectation values (that is, numbers between 0 and 1) given
each candidate alert state. These expectation values may initially be assigned a value of about 0.6 to
indicate a medium expectation of similarity, except where it is known that the expectation
of similarity is either lower or higher. For example, in an access from a compromised
host or a denial of service (DOS) attack, the attacker often spoofs (makes up) the source
IP address. Therefore, the expectation value for the source IP address may initially be assigned a value of about
0.1, indicating a low expectation that the observed source IP address will match that of the attacker.

Based on the alert state, the similarity expectation may be dynamically generated
as the weighted sum of the elements of an expectation table, with the weights from the
(evolving) alert state distribution. This is itself an expectation in the statistical sense,
conditioned on the belief over the present alert state. The similarity expectation is
therefore general enough to cover the situation of an unambiguous alert or "call" from a
signature-based sensor, in which case the distribution over states is 1.0 at the state
corresponding to the alert or call and 0 elsewhere. For example, algebraically for a
feature or measure J:

$$
E_J = \sum_{i \in \{\text{alert states}\}} \mathrm{BEL}(i)\, E_J(i)
$$

where:

$E_J$ = expected similarity for measure J, given the present state;
$\mathrm{BEL}(i)$ = belief that the attack state is presently i;
$E_J(i)$ = element i of the lookup table of expected similarity for measure J, given state i.
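A minimal sketch of the expectation formula, assuming a dictionary-based lookup table. The state names, the measure names `source_ip` and `target_ip`, and the table entries are hypothetical, although the 0.6 default and the roughly 0.1 value for spoofed source addresses follow the text above.

```python
# Hypothetical expectation table E_J(i): for each measure J, the expected
# similarity given each candidate alert state i. A medium value of 0.6 is
# the default; spoofed source addresses get a low value of about 0.1.
EXPECTATION_TABLE = {
    "source_ip": {"dos": 0.1, "compromised_host_access": 0.1, "probe": 0.6},
    "target_ip": {"dos": 0.6, "compromised_host_access": 0.6, "probe": 0.6},
}


def expected_similarity(measure, belief, table=EXPECTATION_TABLE):
    """E_J = sum over alert states i of BEL(i) * E_J(i)."""
    return sum(bel * table[measure][state] for state, bel in belief.items())


# Unambiguous "call" from a signature-based sensor: all belief on one state.
dos_call = {"dos": 1.0, "compromised_host_access": 0.0, "probe": 0.0}
print(expected_similarity("source_ip", dos_call))  # 0.1

# Ambiguous alert: belief split between a DOS attack and a probe.
mixed = {"dos": 0.5, "compromised_host_access": 0.0, "probe": 0.5}
print(round(expected_similarity("source_ip", mixed), 2))  # 0.35
```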

Both the new alert and the alert class to which it is being compared each compute
the similarity expectation for each feature. Feature similarity and similarity expectation
are then combined to form a single value of alert similarity. This is done by combining
feature similarities and normalizing by similarity expectation, over the set of common
features. For example:

$$
\mathrm{Sim}(X,Y) = \frac{\sum_J E_J^X E_J^Y \, \mathrm{Sim}(X_J, Y_J)}{\sum_J E_J^X E_J^Y}
$$

where:

X = new alert;
Y = existing alert;
J indexes measures;
$E_J^X$ ($E_J^Y$) = similarity expectation for measure J, alert X (Y).
A similar definition can be formed by taking products rather than sums above, and
then taking the geometric mean. This has some advantages, but can be overly influenced 
by a single extremely small value. 
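The weighted combination can be sketched as below, together with one possible reading of the product-based alternative: a weighted geometric mean, computed as the exponential of the expectation-weighted mean of log similarities. That reading, the `floor` parameter that keeps a zero similarity from driving the logarithm to minus infinity, and the feature names and values are assumptions for illustration.

```python
import math


def _weights(exp_x, exp_y, common):
    """Per-measure weights E_J^X * E_J^Y over the common measures."""
    return {j: exp_x[j] * exp_y[j] for j in common}


def alert_similarity(feature_sim, exp_x, exp_y):
    """Expectation-weighted arithmetic mean of per-feature similarities."""
    w = _weights(exp_x, exp_y, feature_sim)
    total = sum(w.values())
    if total == 0.0:
        return 0.0
    return sum(w[j] * feature_sim[j] for j in feature_sim) / total


def alert_similarity_geometric(feature_sim, exp_x, exp_y, floor=1e-6):
    """Weighted geometric mean: exp of the weighted mean of log similarities.

    `floor` keeps log() defined when a feature similarity is exactly 0.
    """
    w = _weights(exp_x, exp_y, feature_sim)
    total = sum(w.values())
    if total == 0.0:
        return 0.0
    log_sum = sum(w[j] * math.log(max(feature_sim[j], floor)) for j in feature_sim)
    return math.exp(log_sum / total)


# Two SYN flood alerts against the same target with different (spoofed) sources.
feature_sim = {"source_ip": 0.0, "target_ip": 1.0}
exp_x = {"source_ip": 0.1, "target_ip": 0.6}
exp_y = {"source_ip": 0.1, "target_ip": 0.6}

print(round(alert_similarity(feature_sim, exp_x, exp_y), 3))
print(round(alert_similarity_geometric(feature_sim, exp_x, exp_y), 3))
```

With the low source-address expectation, the arithmetic form still rates the two SYN flood alerts as highly similar (about 0.97), while the geometric form is pulled down to roughly 0.69 by the single near-zero source-address similarity, which illustrates the sensitivity to one extremely small value noted above.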

Where appropriate, similarity expectation may also cover list containment and
subnet similarity. For example, an intrusion detection system may generate two alerts,
one from a network sensor and one from a host sensor. To decide if the target addresses
of these two alerts are similar, the expectation of similarity might be that the target
address in the alert generated by the host sensor would be contained in the list of target
addresses in the network sensor's alert. In another example, an intermediate expectation
of similarity may be generated if target addresses from two alerts do not match but appear
to be from the same subnet.
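A sketch of list containment and subnet similarity using Python's standard `ipaddress` module. The function name, the /24 subnet size, and the intermediate score of 0.5 are hypothetical parameters, not values given in this description.

```python
import ipaddress


def target_similarity(host_target, network_targets, prefix_len=24, subnet_score=0.5):
    """Hypothetical target-address similarity for a host alert vs. a network alert.

    Returns 1.0 if the host sensor's target is contained in the network
    sensor's target list, `subnet_score` if it only shares a /prefix_len
    subnet with one of those targets, and 0.0 otherwise.
    """
    host = ipaddress.ip_address(host_target)
    targets = [ipaddress.ip_address(t) for t in network_targets]
    if host in targets:  # list containment
        return 1.0
    subnet = ipaddress.ip_network(f"{host}/{prefix_len}", strict=False)
    if any(t in subnet for t in targets):  # same subnet, different host
        return subnet_score
    return 0.0


network_alert_targets = ["10.0.0.5", "10.0.0.7", "10.0.1.9"]
print(target_similarity("10.0.0.7", network_alert_targets))  # 1.0
print(target_similarity("10.0.0.42", network_alert_targets))  # 0.5
print(target_similarity("192.168.3.4", network_alert_targets))  # 0.0
```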

D. Transition Models 

When a new alert is encountered, all existing alert classes are preferably passed
through their transition models to generate new prior belief states. Transition models
attempt to assign a "time value" to alert confidence, as well as to anticipate the next
step in a staged attack. The "time value" is typically modeled as a multiplicative decay
factor that reduces alert model confidence slowly over a period of days. The decay period
varies by type of attack, and confidence eventually decreases to some background level. This is used
to downweight very old alert classes when there are multiple candidates to be compared
with a new alert. Transition in time tends to decrease overall suspicion for a specific alert
class, as well as spread suspicion over other hypotheses. That is, it tends to make the
decision whether to group a new alert with an existing alert class less confident.
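A minimal sketch of the "time value" transition, assuming the decay is applied by blending an alert class's belief toward a uniform distribution. The uniform background, the function name, and the per-day factor of 0.8 are illustrative assumptions; a different `daily_decay` could be chosen per attack type to reflect the different decay periods mentioned above.

```python
def time_transition(belief, elapsed_days, daily_decay):
    """Hypothetical time transition for an alert class's belief state.

    A multiplicative factor daily_decay ** elapsed_days shrinks the current
    belief, and the removed mass is spread uniformly over all hypotheses,
    so confidence decays toward a 1/n background level.
    """
    keep = daily_decay ** elapsed_days
    n = len(belief)
    return {state: keep * p + (1.0 - keep) / n for state, p in belief.items()}


belief = {"dos": 1.0, "probe": 0.0}
for days in (0, 1, 3, 30):
    decayed = time_transition(belief, days, daily_decay=0.8)
    print(days, round(decayed["dos"], 3))
```

Blending toward a uniform distribution both lowers the peak belief and raises the others, matching the description of suspicion being reduced and spread over other hypotheses.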

The ability to anticipate the next stage in a staged attack is also achieved by a
transition in state. Internally, a transition matrix is maintained that expresses the
probability of the next state in a staged attack, given the present state. Persistence (the
fact that the next step may match the current one) is explicitly captured. The transition gives
a prior distribution over attack states that is assumed to hold immediately before the next
alert is analyzed.
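The state transition can be sketched as a vector-matrix product. The three stage names and every matrix entry below are hypothetical; only the structure, a row-stochastic matrix whose diagonal expresses persistence, follows the text.

```python
# Hypothetical transition matrix for a three-stage attack: row = present
# state, column = next state. Diagonal entries capture persistence.
TRANSITION = {
    "probe": {"probe": 0.60, "exploit": 0.35, "access": 0.05},
    "exploit": {"probe": 0.05, "exploit": 0.60, "access": 0.35},
    "access": {"probe": 0.05, "exploit": 0.05, "access": 0.90},
}


def state_transition(belief, transition=TRANSITION):
    """Prior over next states: P(next = j) = sum_i BEL(i) * T[i][j]."""
    states = list(transition)
    return {
        j: sum(belief.get(i, 0.0) * transition[i][j] for i in states)
        for j in states
    }


belief = {"probe": 0.8, "exploit": 0.2, "access": 0.0}
prior = state_transition(belief)
print({state: round(p, 3) for state, p in prior.items()})
```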

After the transition models generate new prior belief states for their respective
alert classes, the similarity expectation for each feature in a new alert is updated. The
similarity of the new alert to all existing alert classes is then computed. If none match
above some similarity threshold, the new alert is defined to be a new alert class.
Otherwise, it is assumed that the new alert is a continuation of the best-matching alert
class. By including a list of contributing alerts in the alert class structure, full details of
the new alert are retained.

E. Alert Correlation Method

Figure 4 is a flowchart showing how alerts may be aggregated into classes of
related alerts in a preferred embodiment. As was discussed above, alerts and alert
classes each have one or more distinguishing features that can be assigned different
values. First, a set of potentially similar features shared by a new alert and one or more
existing alert classes is identified (step 401). Next, an expectation of similarity between
the features of the new alert and features of one or more existing alert classes is either
generated or updated (step 403). After a comparison between the new alert and the
existing alert class(es) is complete (step 405), the new alert is either associated with the
existing alert class that it most closely matches (step 407), or a new alert class is defined to
include the new alert (step 409).
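The flow of Figure 4 can be combined with the similarity measure into a small end-to-end sketch. It assumes single-valued alerts, a fixed per-feature expectation table rather than one derived from an evolving alert state, no transition models, and a hypothetical threshold of 0.5; the class and function names are illustrative only.

```python
from collections import Counter

SIMILARITY_THRESHOLD = 0.5  # hypothetical minimum score for joining a class
DEFAULT_EXPECTATION = 0.6  # medium expectation of similarity


def list_similarity(p_x, p_y):
    """Squared-cosine similarity of two {value: probability} vectors."""
    dot = sum(p * p_y.get(v, 0.0) for v, p in p_x.items())
    norm_x = sum(p * p for p in p_x.values())
    norm_y = sum(p * p for p in p_y.values())
    return dot * dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


class AlertClass:
    """An alert class: per-feature hit counts plus its contributing alerts."""

    def __init__(self, alert):
        self.alerts = []
        self.hits = {}
        self.add(alert)

    def add(self, alert):
        self.alerts.append(alert)  # retain full details of each alert
        for feature, value in alert.items():
            self.hits.setdefault(feature, Counter())[value] += 1

    def probabilities(self, feature):
        counts = self.hits[feature]
        total = sum(counts.values())
        return {v: n / total for v, n in counts.items()}


def correlate(alert, classes, expectation):
    """Assign `alert` to the best-matching class or start a new one."""
    best, best_score = None, 0.0
    for cls in classes:
        common = set(alert) & set(cls.hits)  # step 401
        # Step 403 (simplified): the same expectation is used for the alert
        # and the class, so E_J^X * E_J^Y becomes E_J squared.
        weights = {f: expectation.get(f, DEFAULT_EXPECTATION) ** 2 for f in common}
        total = sum(weights.values())
        if total == 0.0:
            continue
        score = sum(  # step 405
            weights[f] * list_similarity({alert[f]: 1.0}, cls.probabilities(f))
            for f in common
        ) / total
        if score > best_score:
            best, best_score = cls, score
    if best is not None and best_score >= SIMILARITY_THRESHOLD:
        best.add(alert)  # step 407
        return best
    new_class = AlertClass(alert)  # step 409
    classes.append(new_class)
    return new_class


expectation = {"source_ip": 0.1, "attack_type": 0.6, "target_ip": 0.6}
classes = []
alerts = [
    {"attack_type": "syn_flood", "source_ip": "1.2.3.4", "target_ip": "10.0.0.1"},
    {"attack_type": "syn_flood", "source_ip": "5.6.7.8", "target_ip": "10.0.0.1"},
    {"attack_type": "portsweep", "source_ip": "9.9.9.9", "target_ip": "10.0.0.50"},
]
for a in alerts:
    correlate(a, classes, expectation)
print([len(c.alerts) for c in classes])  # [2, 1]
```

Because the source address carries little weight, the two SYN flood alerts with different (spoofed) sources join one class, while the unrelated portsweep alert starts a second class.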

Although preferred embodiments have been discussed above, the scope of this 
invention is defined by the following claims: 
What is claimed is:

1. A method for correlating a first sensor to a second sensor in an intrusion detection
system, the first and second sensors each maintaining belief over a number of possible
states of the system, the method comprising the steps of:

(a) transmitting to the first sensor information about the second sensor's belief
state; and

(b) adjusting a prior belief state of the first sensor, the adjustment based at least in
part on the second sensor's belief state.

2. The method of claim 1 wherein the first and second sensors are different types of
sensors.

3. The method of claim 2 wherein the first sensor is a probabilistic sensor.

4. A method for reducing false alarms generated by an intrusion detection system when
a monitored resource is degraded or compromised, the intrusion detection system
having first and second sensors each maintaining belief over a number of possible
states of the system, the method comprising the steps of:

(a) transmitting to the first sensor all or part of the belief of the second sensor
regarding an apparent normal, degraded or compromised state of a monitored
resource; and

(b) adjusting a prior belief state of the first sensor so that an erroneous transaction
with the degraded or compromised resource does not generate an alarm.

5. A method for enhancing the sensitivity of an intrusion detection system that monitors
a plurality of computer system resources, the intrusion detection system having first
and second sensors each maintaining belief over a number of possible states of the
system, the method comprising the steps of:

(a) transmitting to the first sensor all or part of the belief of the second sensor
regarding the existence or validity of services supported on monitored
computer system resources; and

(b) adjusting a prior belief state of the first sensor so that an attempted
communication with a nonexistent system service or resource appears
suspicious.

6. A method for organizing alerts into alert classes, both the alerts and alert classes
having a plurality of features, each feature having one or more values, the method
comprising the steps of:

(a) identifying a set of potentially similar features shared by a new alert and one
or more existing alert classes;

(b) comparing the new alert to one or more existing alert classes;

(c) adjusting the comparison by an expectation that certain feature values will or
will not match, and either:

(d1) associating the new alert with the existing alert class that the new alert most
closely matches; or

(d2) defining a new alert class that is associated with the new alert.

7. A method for organizing alerts into alert classes, both the alerts and alert classes
having a plurality of features, each feature having one or more values, the method
comprising the steps of:

(a) receiving a new alert;

(b) identifying a set of potentially similar features shared by the new alert and one
or more existing alert classes;

(c) updating a similarity expectation for one or more feature values;

(d) comparing the new alert with one or more alert classes, and either:

(e1) associating the new alert with the existing alert class that the new alert most
closely matches; or

(e2) defining a new alert class that is associated with the new alert.
8. The method of claim 7 further comprising the step (a1) of passing each existing alert
class through a transition model to generate a new prior belief state for each alert
class.

9. A method for organizing alerts having a plurality of features, each feature having one
or more values, the method comprising the steps of:



(a) generating a group of feature records for a new alert, each feature record
including a list of observed values for its corresponding feature;

(b) identifying a set of potentially similar features shared by the new alert and one
or more existing alert classes that are associated with previous alerts;

(c) comparing the new alert to one or more alert classes;

(d) adjusting the comparison by an expectation that certain feature values will or
will not match, and either:

(e1) associating the new alert with the existing alert class that the new alert most
closely matches; or

(e2) defining a new alert class that is associated with the new alert.
ABSTRACT

This invention uses probabilistic correlation techniques to increase sensitivity, reduce
false alarms, and improve alert report quality in intrusion detection systems. In one preferred
embodiment, an intrusion detection system includes at least two sensors to monitor different
aspects of a computer network, such as a sensor that monitors network traffic and a sensor
that discovers and monitors available network resources. The sensors are correlated in that
the belief state of one sensor is used to update or modify the belief state of another sensor. In
another embodiment of this invention, probabilistic correlation techniques are used to
organize alerts generated by different sensors in an intrusion detection system. By
comparing features of each new alert with features of previous alerts, and adjusting the
comparison by an expectation that certain feature values will or will not match, the alerts can
be grouped in an intelligent manner.
Abstract

With the growing deployment of host and network intrusion detection systems, managing
reports from these systems becomes critically important. Alert reports contain a variety
of attributes describing, for example, the sensor's confidence in the attack, the nature of
the attack, and the assets affected. Generally, not all attributes are specified on any given
alert report, and management approaches should make optimal use of the attributes
available. We present two components that employ probabilistic approaches to address
two key problems in alert management: alert correlation and prioritization. For
correlation, we extend ideas from multisensor data fusion. For shared attributes between
alerts that are candidates for fusion, we consider appropriate similarity measures adjusted
for situation-specific expectation of similarity. Prioritization is addressed in a Bayes
model that produces an alert ranking in agreement with a human expert. By assuming
that dependencies between attributes are limited to attribute groupings, the required
model can be specified compactly. Our model comprehends differential weighting of
attributes in an alert, incorporation of preference profiles, optimal use of observed
attributes, update capability, and extensibility. We present the base model as well as a
variant that can be interactively trained by a human expert. We also present results from
an initial deployment of these components.

Keywords: Network security, distributed system security, alert management, alert
prioritization, adaptive systems




Introduction 

In response to attacks and potential attacks against enterprise networks, administrators are
increasingly deploying intrusion detection systems (IDSs). These systems monitor
networks, critical files, and so forth, using a variety of signature and probabilistic
techniques [1, 2]. The use of such systems has given rise to another difficulty, namely,
the intelligent management of a potentially large number of alerts from heterogeneous
sensors. Two important aspects of alert management are alert correlation and alert
priority ranking. To the degree that these aspects have been addressed in current systems,
heuristic techniques have been used. We describe probabilistic approaches to these
critical problems.

The intrusion detection community is actively developing standards for the content of alert messages [5]. Systems adhering to these evolving standards forward attack descriptions and supporting diagnostic data to an alert management interface (AMI), which may be remotely located and which consolidates the reports of numerous sensors. In anticipation of these standards, we are evolving systems for alert correlation and ranking. The present development expands on our earlier correlation approach, provides results of a deployment, and introduces a component that prioritizes alerts. The components rely on statistical similarity measures and probabilistic inference in the form of Bayes networks.

Bayes approaches, and probabilistic formalisms in general, represent a minority of the inference methodologies employed to date by intrusion detection systems, as well as by evolving systems for correlating and prioritizing alerts from such systems. Theoretically, a probabilistic system needs to specify the entire joint probability distribution of observable attributes and the corresponding priority ranking. This is extremely difficult because of the curse of dimensionality. Instead, the Bayes approach is to assume that dependencies between attributes are local, so a much more compact representation of the system's knowledge base (local conditional probability relations) is possible. The compactness of knowledge representation and the adaptive potential make this approach attractive relative to signature systems.
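The compactness argument can be made concrete by counting free parameters. The Python sketch below assumes, purely for illustration, a hypothetical model with a three-valued priority and ten three-valued attributes, each depending only on the priority node (a star-shaped network; the model described later in this paper adds intermediate nodes). The function names are illustrative and are not part of any EMERALD component.

```python
from math import prod


def full_joint_size(cardinalities):
    """Free parameters of an unrestricted joint distribution over all variables."""
    return prod(cardinalities) - 1


def star_model_size(root_card, leaf_cards):
    """Free parameters when every attribute depends only on the root (priority) node:
    the root prior plus one conditional distribution per root state for each leaf."""
    return (root_card - 1) + sum(root_card * (k - 1) for k in leaf_cards)


# Hypothetical sizes: a 3-valued priority and ten 3-valued attributes.
attributes = [3] * 10
print(full_joint_size([3] + attributes))  # 177146
print(star_model_size(3, attributes))     # 62
```

Even in this small case the factored form needs 62 numbers instead of 177,146, and the gap grows exponentially with the number of attributes.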

The remainder of this paper is organized as follows. We first expand on our earlier work in probabilistic alert correlation. We then define the priority ranking model, including a description of the adaptive training approach, wherein the desired result is to duplicate the priority ranking that a human expert would assign. Finally, we present preliminary results from the alerts generated by various EMERALD monitors [1, 2].

Sensor Correlation and Alert Fusion 

Sensor correlation consists of three key functions, presented in order of increasing abstraction. [...] and so forth, into a manageable number of alert reports [...] the EMERALD thread mechanism. At the next level, alerts may be fused from heterogeneous sensors, as well as auxiliary information, to provide an operator with intelligently derived meta alerts. Figure 1 illustrates this hierarchy; the event counts reflect actual counts from a recent 45-day run of some components on our own TCP gateway.




Figure 1: Sensor Correlation Hierarchy

Alert Threads

EMERALD sensors all have the concept of an alert thread, which is compatible with evolving standards for alert content [5]. Sensors may update reports on an existing attack [...] significant increases in the attack confidence [...] or other factors. For [...] the analyst's console. By assigning all [...]

Comprehending Multiple Sensors

[...] to unobservable hypotheses by conditional probability relationships.



The Bayes availability monitor adapts to valid hosts and services provided on a protected subnet, and makes available to the session monitor a confidence that a given service request is valid or not. In the latter case, the session monitor increases its prior suspicion in the case of failed requests. With this coupling, we effectively detect the stealthy portsweeps in the Lincoln Laboratory evaluation data. While the small number of ports in these stealthy attacks is indicative of a specific probe for which a signature can be defined, we have taken an approach that we feel is more general and powerful against new probes. The result is that we can detect a variety of other portsweeps (such as nmap and strobe) with no modification of our existing models. Thus, knowledge of the state of one sensor achieves a practical and significant improvement in the sensitivity of another.
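The coupling between the availability and session monitors can be sketched as a Bayes update whose prior depends on another sensor's output. In the Python sketch below, the mixing rule in `coupled_prior`, the prior values, and the likelihoods are hypothetical assumptions chosen for illustration; the paper does not specify how the EMERALD monitors combine these quantities.

```python
def posterior_attack(prior, p_obs_given_attack, p_obs_given_normal):
    """Bayes' rule for a binary hypothesis: P(attack | observation)."""
    num = prior * p_obs_given_attack
    return num / (num + (1.0 - prior) * p_obs_given_normal)


def coupled_prior(p_service_valid, base_prior=0.01, raised_prior=0.3):
    """Hypothetical coupling: the session monitor's prior suspicion of a failed
    request rises as the availability monitor's confidence in the service falls."""
    return p_service_valid * base_prior + (1.0 - p_service_valid) * raised_prior


# A failed request: likely under a probe, uncommon for a normal client (assumed values).
P_FAIL_ATTACK, P_FAIL_NORMAL = 0.9, 0.1

uncoupled = posterior_attack(0.01, P_FAIL_ATTACK, P_FAIL_NORMAL)
coupled = posterior_attack(coupled_prior(0.05), P_FAIL_ATTACK, P_FAIL_NORMAL)
print(f"without coupling: {uncoupled:.3f}, with coupling: {coupled:.3f}")
```

With these assumed numbers, the same failed request yields a posterior of about 0.08 on its own, but about 0.78 once the availability monitor reports that the requested service is probably not valid.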

The availability monitor also informs our session monitor when a valid service is in a degraded state (again, this is a probabilistic call). Based on this, the monitor alters its prior expectation of the world, adjusting the prior expectation of anomalous values for particular features. In a simulated attack environment, we have a large number of normal clients accessing a network that is under attack. When the attacker achieves a successful denial of service (DOS), these clients suddenly appear anomalous. With our sensor coupling, we detect the DOS and do not raise alerts for the (now failed) legitimate traffic. Essentially, the internal models dynamically come to tolerate values for some features that would normally be indicative of certain attacks. There remains the potential to generate alerts from these sessions, but the evidence must now come from other features. Without this capability, a successful DOS attack might cause hundreds of alerts for the otherwise normal traffic (what we term "collateral damage"). The large number of alerts would overwhelm the analyst, and potentially bury the one valid alert due to the attacker. The approach taken generates one alert for the attack session and one alert that the service is in a degraded state.

We therefore see that our approach, where one sensor comprehends the state of another, both increases sensitivity (evidenced by the ability to detect stealthy portsweeps) and suppresses false (or spurious) alarms. The use of a Bayes formalism makes this coupling mathematically convenient, but is not a requirement.

Meta Alerts and the Alert Template 

[...] specifics, our template [...] management components [...] and alert [...]. [...] a "severity" field, in addition to the "confidence" [...] sensors. We also include [...] for example, the specific [...] a downstream component [...] fields describing the sensor type [...]. [...] employing both signature and [...]. Our early experiments indicate that diverse sensors will be able to fill this template with content that will be more useful to correlation engines than is currently available.

Our probabilistic alert fusion approach considers feature commonality, feature similarity, and situation-specific expectation of similarity. We maintain a list of "meta alerts" that are possibly composed of several alerts, potentially from heterogeneous sensors. For two alerts (typically a new alert and a meta alert), we begin by identifying the features they have in common. Such features include the source of the attack, the target (hosts and ports), the type of the attack, time information, and so forth. For each feature, we have a similarity function that returns a number between 0 and 1, with 1 corresponding to a perfect match. Similarity considers such questions as:

How well do two lists overlap (for example, lists of targeted ports)?

Is one observed value contained in the other (for example, is the target port of a DOS attack one of the ports that was the target of a recent probe)?

If two source addresses are different, are they likely to be from the same subnet?
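Each of the three questions above maps naturally onto a small similarity function. The Python sketch below is one possible set, under stated assumptions: list overlap uses the cosine of hit-count vectors (as in the Technical Appendix), while the /24 subnet boundary and the intermediate value 0.5 in `address_similarity` are hypothetical choices, not values from the paper.

```python
import ipaddress
from math import sqrt


def list_overlap(x, y):
    """Cosine similarity of two hit-count lists given as {value: count} dicts."""
    dot = sum(x[v] * y[v] for v in x.keys() & y.keys())
    norm = sqrt(sum(c * c for c in x.values())) * sqrt(sum(c * c for c in y.values()))
    return dot / norm if norm else 0.0


def containment(value, values):
    """1.0 if a single observed value occurs in the other alert's list, else 0.0."""
    return 1.0 if value in values else 0.0


def address_similarity(a, b, prefix=24, same_subnet=0.5):
    """Exact match -> 1.0; same subnet (assumed /prefix) -> intermediate value; else 0.0."""
    if a == b:
        return 1.0
    subnet = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return same_subnet if ipaddress.ip_address(b) in subnet else 0.0


print(list_overlap({21: 3, 80: 1}, {21: 1, 110: 2}))
print(containment(80, {21, 80, 143}))                # 1.0
print(address_similarity("10.0.1.5", "10.0.1.9"))    # 0.5
print(address_similarity("10.0.1.5", "10.0.2.9"))    # 0.0
```

Because cosine similarity is scale-invariant, raw hit counts and their normalized probabilities give the same result.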

Not all sensors produce all possible identifying features. For example, a host sensor provides a process ID, while a network sensor does not. Features not common to both alerts are not considered for the overall similarity match.

An important innovation we introduce is expectation of similarity. This is also a number between 0 and 1, and it expresses our prior expectation that the feature should match if the two alerts are related, considering the specifics of each. For example, two probes from the same attacker might scan the same set of ports on different parts of our subnet (so the expectation of matching target IP address is low). Also, some attacks such as SYN FLOOD spoof the source address, so we would allow a match with an earlier probe of the same target even if the source does not match (the expectation of match for source IP is low).

We then compose overall alert similarity from feature similarity values normalized by expectation of similarity. The approach is valid regardless of the number of features that overlap, and is thus well suited for use with multiple heterogeneous sensors. The new alert is fused with the meta alert to which it is most similar, provided that the similarity is good enough. If no existing meta alert is a sufficiently good match, the new alert begins a new meta alert thread. Fusion consists of combining features so that the new meta alert is a superset of the previous meta alert and the new observation. Lists are merged, comprehending "hit counts" for each list element. The merge functionality includes "trim" [...]. Mathematical details of our similarity functions and expectation of similarity are given in the Technical Appendix, and the following subsections give more detail on feature fusion and similarity expectation.
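The match-or-create step described above can be sketched as follows. This Python sketch assumes a hypothetical `similarity` callable (standing in for the expectation-weighted similarity of the Technical Appendix) and a hypothetical threshold of 0.7; fusion is reduced to appending the alert, whereas the engine described here merges feature lists.

```python
from dataclasses import dataclass, field


@dataclass
class MetaAlert:
    alerts: list = field(default_factory=list)   # component alerts fused so far


def assign(new_alert, meta_alerts, similarity, threshold=0.7):
    """Fuse new_alert into its most similar meta alert if the similarity reaches
    `threshold`; otherwise start a new meta alert. Returns the meta alert used."""
    best, best_sim = None, -1.0
    for meta in meta_alerts:
        s = similarity(new_alert, meta)
        if s > best_sim:
            best, best_sim = meta, s
    if best is not None and best_sim >= threshold:
        best.alerts.append(new_alert)   # stand-in for feature-wise list merging
        return best
    meta = MetaAlert([new_alert])
    meta_alerts.append(meta)
    return meta


def same_source(alert, meta):
    """Toy similarity for illustration: 1.0 if any component alert shares the source."""
    return 1.0 if any(a["src"] == alert["src"] for a in meta.alerts) else 0.0


metas = []
for src in ["192.0.2.7", "192.0.2.7", "198.51.100.3"]:
    assign({"src": src}, metas, same_source)
print(len(metas), [len(m.alerts) for m in metas])   # 2 [2, 1]
```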

Feature fusion 

When the system decides to fuse two alerts, based on aggregate similarity across common features, the fused feature set is a superset of the features of the two alerts. Feature values in fused alerts are typically lists, so alert fusion involves list merging. For example, suppose a probe of certain ports on some range of the protected network matches in terms of ports with an existing probe that originated from the same attacker subnet, but the target hosts in the prior alert were in a different range of our network. The attacker's address is appended to the attacker address list, and the lists of target hosts are merged. The port list matches and is thus unchanged.

In contrast to the common practice of simply listing all feature values ever seen, we also maintain an associated hit count for each value. For all list features we have a corresponding trim function that removes list elements with extremely low hit counts.
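List merging with hit counts, and the companion trim function, might look like the following Python sketch. Adding two `collections.Counter` objects yields the union of values with summed counts; the relative cutoff used by `trim` (1% of all hits) is a hypothetical choice, since the text only says "extremely low hit counts".

```python
from collections import Counter


def merge_lists(meta_counts, new_counts):
    """Fuse two hit-count lists: the union of values, with hit counts added."""
    return Counter(meta_counts) + Counter(new_counts)


def trim(counts, min_fraction=0.01):
    """Drop values whose share of all hits falls below min_fraction (assumed cutoff)."""
    total = sum(counts.values())
    if total == 0:
        return Counter()
    return Counter({v: c for v, c in counts.items() if c / total >= min_fraction})


# The example from the text: a new probe of a different target range.
targets = merge_lists({"aaa.bbb.1.1": 3, "aaa.bbb.2.1": 2}, {"aaa.bbb.7.1": 1})
ports = merge_lists({21: 5, 110: 5}, {21: 1, 110: 1})   # same ports; only counts grow
print(sorted(targets), dict(ports))
print(trim(Counter({"aaa.bbb.1.1": 500, "aaa.bbb.9.9": 1})))
```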

Two important synthetic features are the sensor and thread identifiers of all the component alerts, so that the operator is always able to examine in detail the alerts that contribute to the meta alert report. There is potential for consolidation in a useful sense, but with no loss of information, since the component alerts are always available for examination.



Situation-Specific Similarity Expectation 

We now give some examples of how expectation of similarity depends on the situation, that is, on the features in the meta alert and the new alert.

If an alert from a sensor has a thread identifier that matches the list of sensor/thread identifiers for some meta alert, the alert is considered a match and fusion is done immediately. In other words, the individual sensor's determination that an alert is an update of, or otherwise related to, one of its own alerts overrides other considerations of alert similarity.

If the meta alert has received reports from host sensors on different hosts, we do not expect the target host feature to match. If at least one report from a network sensor has contributed to the meta alert and a host sensor alert is received, the expectation of similarity is that the target address of the latter is contained in the target list of the former. To consider whether an exploit can plausibly be considered the next stage of an attack [...], we expect the target of the exploit (the features host and port) to be contained in the target host and port list of the meta alert.

[...] match, but we defer to the attack [...]. In our [...] demonstration environment, we run a variant of mscan that probes certain sensitive ports; that is, it is of the attack class "portsweep". [...] generalization capability and has no "mscan" model, but [...] acceptable matches in the target [...] a subsequent exploit potentially similar if the [...] involved. In case of an exact match of address, similarity is perfect. We assign high similarity if the subnet appears to match. In this way, a meta alert may potentially consist of a list of attacker addresses. At this point, we consider similarity based on containment. In addition, if an attacker compromises a host within our network, that host is added to the list of attacker hosts for the meta alert in question. Finally, for attack classes where the attacker's address is likely to be spoofed (for example, the Neptune attack), similarity expectation with respect to attacker address is assigned a low value.

In this fashion, it is possible to recognize a staged attack composed of, for example, a probe followed by an exploit to gain access to an internal machine, and then the use of that machine to launch an attack against a more critical asset.
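The situation-specific rules above can be encoded as a function that returns per-feature expectations. In the Python sketch below, the numeric levels, the dictionary field names, and the attack-class labels are hypothetical; the rule ordering (the network-sensor containment rule applied last, so it overrides the different-hosts rule) is also an assumption rather than something stated in the text.

```python
LOW, MEDIUM, HIGH = 0.1, 0.6, 1.0                  # hypothetical expectation levels
SPOOFED_SOURCE_CLASSES = {"neptune", "syn_flood"}  # hypothetical class labels


def thread_match(alert, meta):
    """A sensor's own thread identifier overrides every other similarity consideration."""
    return (alert["sensor"], alert["thread"]) in meta["threads"]


def expectations(alert, meta):
    """Per-feature similarity expectations for comparing a new alert with a meta alert."""
    exp = {"attacker": MEDIUM, "target_host": MEDIUM, "target_port": MEDIUM}
    if alert["attack_class"] in SPOOFED_SOURCE_CLASSES:
        exp["attacker"] = LOW          # source address is likely spoofed
    if len(meta["host_sensor_targets"]) > 1:
        exp["target_host"] = LOW       # host sensors on different hosts already reported
    if meta["has_network_report"] and alert["sensor_type"] == "host":
        exp["target_host"] = HIGH      # expect containment in the meta alert's target list
    return exp


meta = {"threads": {("tcp-monitor", 17)}, "host_sensor_targets": {"aaa.bbb.1.1"},
        "has_network_report": True}
alert = {"sensor": "host-monitor", "thread": 4, "sensor_type": "host",
         "attack_class": "neptune"}
if thread_match(alert, meta):
    print("fuse immediately")
else:
    print(expectations(alert, meta))
```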



The Priority Ranking Model



Another important aspect of alert management is the need to rank alerts, so that the administrator can concentrate on the most important ones. We present a Bayes approach to assign a priority ranking to alerts that provide data for several key attributes. Our system has the following features:

Ability to weight the priority ranking along several attribute groupings, such as attack type or criticality of assets affected

Compact representation of the influence of the value of an attribute on the priority

Incorporation of the administrator's preference profile as to the relative importance of observed values (such as attack type)

Ranking influenced only by the attributes specified on a given alert; in general, an alert may not report all possible attributes

Ability to update the prioritization based on observation of a new attribute

Extensibility of the model to comprehend attributes that may be defined in the future, with minimal perturbation to the rest of the model

Computationally, our approach is to design a Bayes classifier whose output is a priority value and whose observable evidence consists of the attribute values. The influence of an attribute on the output is expressed in terms of conditional probability relations.

[...] assign a priority to an alert that would [...] that assigned by other domain experts, [...] the volume of alerts in a [...] intrusion detection domain expertise does not at present exist [...]. We take this initial representation as a starting point, and provide an adaptive capability that enables the expert to modify the system's "call". This requires domain knowledge on the part of the expert, but no special knowledge of Bayes systems.

Our model consists of a root node representing the output priority ranking (presently "low", "medium", and "high"). From the root there are a number of main branches representing attribute groupings. These are linked to the root node by what are effectively "pass through" nodes, whose function is to weight the subtree corresponding to the principal branch. The reader will recall that this differential weighting was noted as one of the principal desirable features of our system. In the present implementation, we have two main branches: one connects to a node representing the influence of the attack attributes on the priority, while the other connects to a node representing the influence of criticality. Under the first node (node A in Figure 2) are leaf nodes expressing the relationship of attack-specific attributes to the priority. Under the second node (node B in the figure) are leaf nodes representing the criticality of asset classes that may be touched by an alert. This representation of attribute groupings as major branches from the root allows for subjective weighting as to the relative importance of the attributes in question. The relative weighting of the major branches is achieved by the "pass through" action of nodes A and B, as discussed in more detail below.

We have proposed this model as a demonstration of concept, namely, that attributes can be grouped and that we may wish to express the fact that different groups should have different impacts on the output result. The two proposed attribute groups, described in the next subsections, are only one possible division. We are not limited to two attribute groups, nor are we precluded from a more complex branching structure within [...]; the present structure seems to provide a good tradeoff between simplicity and efficacy.
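For a tree of this shape, evidence at the leaves can be propagated to the root with Pearl-style likelihood (lambda) messages: each pass-through node sends the root the product of its leaves' likelihoods, mapped through its CPT, and the root multiplies these messages into its prior. The Python sketch below shows that computation; all CPT values and node names are hypothetical, and we assume, for simplicity, that each branch node has the same three states as the root.

```python
PRIORITY = ("low", "medium", "high")


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def branch_message(pass_through, leaf_cpts, evidence):
    """Likelihood message a major-branch node (A or B) sends to the root.

    pass_through[r][b] = P(branch=b | root=r)
    leaf_cpts[name][b][v] = P(leaf=v | branch=b)
    evidence[name] = index of the observed leaf value; unobserved leaves are omitted.
    """
    n = len(pass_through)
    lam = [1.0] * n
    for name, v in evidence.items():
        lam = [lam[b] * leaf_cpts[name][b][v] for b in range(n)]
    return [sum(row[b] * lam[b] for b in range(n)) for row in pass_through]


def priority_posterior(prior, branches):
    """Combine the root prior with one message per major branch."""
    post = list(prior)
    for pass_through, leaf_cpts, evidence in branches:
        msg = branch_message(pass_through, leaf_cpts, evidence)
        post = [p * m for p, m in zip(post, msg)]
    return normalize(post)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical asset leaf: P(not critical), P(critical) given each branch state.
ASSET = {"host": [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]}

post = priority_posterior([1 / 3] * 3, [(IDENTITY, ASSET, {"host": 1})])
print(dict(zip(PRIORITY, (round(p, 3) for p in post))))
```

Because leaves absent from `evidence` contribute no factor, an alert that reports only some attributes is ranked on those attributes alone.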




Figure 2: Bayes Model for Alert Prioritization (attack class subtree and asset criticality subtree)

Attack Class Attributes 

At present, attack class attributes are expressed as a preference profile over the attack classes. This is a configurable measure of concern preassigned to each attack class. We configure our system so that probes are of low concern, while attacks such as unauthorized access are of greater concern. An alert (raw or fused) provides a probability mass over the attack classes; systems capable of a single call assign all their mass to one attack class. This is potentially moderated based both on the sensor's confidence in its call and on the fusion and prioritization engine's confidence in the sensor. Based on this probability mass, we perform the equivalent of an expected value calculation and return a priority score based on attributes of the attack class. The system has a node that relates this partial score to the overall priority assigned at the root node.
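The expected-value calculation over attack classes is sketched below in Python. The probe value of 0.1 matches the LOW setting reported in the simulation results; the other preference values are hypothetical, as is the way `moderate` discounts a sensor's call by blending it with a uniform distribution, since the text does not state how confidence moderation is computed.

```python
# Hypothetical preference profile: level of concern per attack class, in [0, 1].
CONCERN = {"probe": 0.1, "denial_of_service": 0.5, "unauthorized_access": 0.9}


def moderate(mass, confidence, classes):
    """Assumed moderation: blend the sensor's probability mass with a uniform
    distribution, giving weight `confidence` to the sensor's call."""
    k = len(classes)
    return {c: confidence * mass.get(c, 0.0) + (1.0 - confidence) / k for c in classes}


def attack_class_score(mass, confidence=1.0, concern=CONCERN):
    """Expected concern under the (moderated) mass over attack classes."""
    m = moderate(mass, confidence, list(concern))
    return sum(m[c] * concern[c] for c in concern)


print(attack_class_score({"probe": 1.0}))                  # a confident single call
print(attack_class_score({"probe": 1.0}, confidence=0.5))  # a less trusted sensor
```

Lowering confidence pulls the score toward the profile's average concern (0.5 here), which is one simple way to express distrust of a sensor.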

Asset Criticality Attributes 

Figure 3 represents the subtree that relates asset criticality to alert priority. Our initial model identifies five attributes that are potentially critical: user, protocol, service, file (to include directory), and host/subnet. A given alert may not provide values for all these attributes. For example, the TCP session monitor in [1] does not examine user or file. Therefore, for each of these attributes, the alert processor passes to our model one of three values: the attribute was not observed; the attribute was observed and not considered critical; or the attribute was observed and critical. Criticality of an asset is based on a configuration file that reflects security policy. Our model supports dynamic change to the security policy. The elements of the respective CPTs reflect $P(\text{criticality} = c \mid \text{priority} = p)$. Each of these matrices represents two values of criticality [...]. [...] The knowledge base consists of a set of CPTs [...] appropriate node on its main branch. If the attribute is not observed, the corresponding node is not changed, and thus this attribute does not influence the result. [...] are all key desirable features of a system of this type, and they are rigorously addressed in our formalism.
Figure 3: Asset Criticality Subtree
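The three-valued asset attributes can be handled by giving an unobserved attribute a flat likelihood, so that it leaves the belief unchanged. The Python sketch below attaches one such leaf directly to a three-state node for simplicity and assumes that whether an attribute is observed does not depend on priority; the CPT values and names are hypothetical.

```python
NOT_OBSERVED, NOT_CRITICAL, CRITICAL = range(3)


def leaf_likelihood(p_critical, value):
    """Likelihood over the parent's states from one asset-criticality leaf.

    p_critical[s] = P(asset critical | observed, parent state s), a hypothetical CPT.
    Observation itself is assumed independent of priority, so an unobserved
    attribute yields all ones, i.e. no change to the belief.
    """
    if value == NOT_OBSERVED:
        return [1.0] * len(p_critical)
    if value == CRITICAL:
        return list(p_critical)
    return [1.0 - p for p in p_critical]


def update(belief, likelihood):
    post = [b * l for b, l in zip(belief, likelihood)]
    total = sum(post)
    return [p / total for p in post]


HOST_CPT = [0.1, 0.4, 0.8]   # hypothetical, for parent states low / medium / high
prior = [1 / 3, 1 / 3, 1 / 3]
print(update(prior, leaf_likelihood(HOST_CPT, NOT_OBSERVED)))  # unchanged prior
print(update(prior, leaf_likelihood(HOST_CPT, CRITICAL)))      # shifts toward high
```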



Pass-Through Nodes 

Nodes A and B are the roots of distinct subtrees reflecting the influence of different groups of attributes on the desired result. We can consider these nodes as serving a "pass through" function, propagating the subtree result to the root. If the CPTs relating these major branch nodes to the root are identity matrices, the evaluation from the leaves under the branch is passed through without alteration. Moving mass off the diagonal effectively perturbs and downweights the corresponding subtree result, expressing, for example, different subjective assignments of importance or confidence in a particular attribute group. Initially, these CPTs are near-identity matrices, with some off-diagonal mass.
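The effect of a pass-through CPT can be seen by sending the same subtree message through an identity matrix and through a near-identity matrix. The Python sketch below uses a hypothetical `near_identity` construction that spreads the off-diagonal mass evenly; the message values are likewise illustrative.

```python
def pass_through(cpt, message):
    """Message to the root: sum over b of P(branch=b | root=r) * message[b]."""
    return [sum(p * m for p, m in zip(row, message)) for row in cpt]


def near_identity(n, off_diagonal):
    """Hypothetical CPT: 1 - off_diagonal on the diagonal, remainder spread evenly."""
    spread = off_diagonal / (n - 1)
    return [[1.0 - off_diagonal if r == b else spread for b in range(n)] for r in range(n)]


message = [0.1, 0.3, 0.9]                            # subtree evidence for low/medium/high
print(pass_through(near_identity(3, 0.0), message))  # passed through unchanged
print(pass_through(near_identity(3, 0.3), message))  # flattened: branch downweighted
```

With 30% off-diagonal mass, the ratio between the strongest and weakest entries falls from 9 to about 2.8, so this branch pulls the root posterior less strongly than an unaltered branch would.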



Adaptive Training

The Bayes inference engine used here has an adaptive capability described in [2]. Briefly stated, the system behaves as if the CPTs are based on effective counts. If a hypothesis (priority assignment) "wins" (its posterior belief is above a settable learning threshold), entries in the CPT are adjusted slightly in the direction of the observation (the observations are really the likelihood messages at the leaf nodes). The effective count for the winning hypothesis is aged (multiplied by a decay factor) and incremented by one for the current observation; the effective counts for the other hypotheses are only aged. Therefore, frequently observed hypotheses approach a saturation count, and as the adjustment depends on the effective count, new observations perturb the current CPT only slightly. Conversely, a very rarely observed hypothesis adapts more quickly to new observations, as its effective count decays to a lower value and thus assigns less weight to past values. This can be thought of as hypothesis-specific annealing.
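The effective-count scheme can be sketched as a small class. This Python sketch is an illustration under stated assumptions, not the engine described in [2]: each CPT row is a distribution over a child's values, a winning row moves toward the normalized likelihood message by a step of 1/(effective count), and the decay factor, learning threshold, and initial counts are hypothetical. With decay factor d, a hypothesis that keeps winning saturates at an effective count of 1/(1 - d).

```python
class AdaptiveCPT:
    """Effective-count adaptation for one CPT whose rows are indexed by hypothesis."""

    def __init__(self, rows, decay=0.9, threshold=0.8):
        self.rows = [list(r) for r in rows]   # rows[h] = P(child value | hypothesis h)
        self.counts = [1.0] * len(rows)       # effective counts (assumed to start at 1)
        self.decay = decay
        self.threshold = threshold

    def learn(self, posterior, message):
        """Adapt toward `message` if one hypothesis wins; return True if learning occurred."""
        winner = max(range(len(posterior)), key=lambda h: posterior[h])
        if posterior[winner] < self.threshold:
            return False
        self.counts = [c * self.decay for c in self.counts]   # age every hypothesis
        self.counts[winner] += 1.0                            # credit the winner
        total = sum(message)
        step = 1.0 / self.counts[winner]   # small once the count nears saturation
        self.rows[winner] = [r + step * (m / total - r)
                             for r, m in zip(self.rows[winner], message)]
        return True


cpt = AdaptiveCPT([[0.5, 0.5], [0.5, 0.5]])
for _ in range(300):
    cpt.learn([0.95, 0.05], [1.0, 0.0])     # hypothesis 0 keeps winning
print(cpt.counts, cpt.rows[0])
```

After many wins, hypothesis 0 sits near its saturation count of 10, so each new observation moves its row by only about a tenth; hypothesis 1, which has not won in a long time, has a decayed count near zero, so its next win moves its row almost all the way to the new observation.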



To utilize this in the present system, we have an interactive facility that randomly generates attribute values and prompts the operator for the priority ranking to be assigned to the alert in question. This becomes the state value for an auxiliary "hard call" node, whose value forces the call for the net as a whole. The learning facility is then invoked, and the CPT values adapt accordingly. We believe that with reasonable initial values for the CPTs, based on our expert judgment in this area, this iterative adaptation can be achieved with far fewer simulated alerts than would be required to, for example, train a neural net from scratch.



Results 



Live Data 

The following result is from a probe attack detected against our internal network that apparently looked for various ports on different subnets over several days. Each individual component of the attack is detected by the Bayes TCP sensor, which does not know beforehand that this is a sensitive set of ports. The attack does not cause a denial of service.



portsweep 2000-08-10 06:50:28 from 193.230.37.2 [...]

[...]

30 dest IPs: aaa.bbb.1.1 [...] aaa.bbb.4.1 aaa.bbb.5.1 [...] aaa.bbb.29.1 aaa.bbb.30.1

6 dest ports: 635(30) 110(29) 143(25) 53(27) 21(27) 109([...])

[...]

Invalid hosts 30 Invalid ports 6 [...]

Similar alerts from the same IP address appeared over several days. The meta alert fusion utility produced the following meta alert:

Meta_alert thread [...]
source IPs 193.230.37.2
[...]
aaa.bbb.26.1 aaa.bbb.27.1 aaa.bbb.28.1 aaa.bbb.29.1 aaa.bbb.30.1
aaa.bbb.31.1 [...] aaa.bbb.3.2 aaa.bbb.4.2 aaa.bbb.5.2
[...] aaa.bbb.13.2 aaa.bbb.14.2 aaa.bbb.15.2 aaa.bbb.16.2
aaa.bbb.17.2 aaa.bbb.18.2 aaa.bbb.19.2 aaa.bbb.20.2 aaa.bbb.21.2
from 2000-08-10 06:50:28 to 2000-08-14 01:47:18

Sum Hits [...] dot product [...]
Index 635 prob 0.170455
Index 110 prob 0.166534
Index 143 prob 0.167341
Index 53 prob 0.165906
Index 21 prob 0.169955
Index 109 prob [...]

Threads [...] 6980 7650 8223 8767 [...] 9778 [...] 11577 [...] 13035 13427 [...] 14525 14887 15258 [...] 17040 17321 [...] 17907 18238 [...]

This sequence of attacks was confirmed [...] by our system administration staff. The list of thread identifiers permits the administrator to examine any of the stages in the attack. In this case, each attack stage is considered a portsweep; if the stages consisted of different attack classes, these would be listed under "Attack steps" above. This attack was a probe (low priority) but accessed at least one critical host, so the overall assigned priority is 0.45 (on a scale of 0 to 1).

Simulation Environment 

We describe the operation of the above components in a simulated attack environment. The environment simulates an electronic commerce site, and provides Web and mail services over the Internet. The protected network is behind a firewall, and is instrumented with several host, network, and protocol monitors. Two machines are visible through the firewall, and two auxiliary machines are used for network monitoring and alert management. The firewall blocks all services other than Web and mail. We have simulated traffic that appears to come from a number of sources, and an attacker who executes certain well-known attacks.

The attack begins with an mscan probe to the two machines visible through the firewall. [...] portsweep. The target port lists for these alerts match, as does the attacker address. [...] the order in which the alerts [...] we do not expect target [...] targets seen so far, and we expect the target address of subsequent host alerts to be contained in the meta alert's target host list. Regardless of arrival order, these alerts are fused. We have set a preference profile value of LOW (0.1) for probe attacks, but since two critical hosts and one critical service were probed, the overall alert is prioritized by the Bayes prioritization component at 0.64 on a scale from 0 to 1.

The attacker next uses a CGI exploit to obtain the password file from the Web server. This is detected by the host sensor. The fusion engine considers this a new step in the existing attack because of a match in attacker address, as well as the fact that the target host and port are contained in the target host and port list of the existing probe. The assigned priority rises to over 0.95 because of the attack escalation (probe to unauthorized access) and asset criticality (critical host, service, and now file).

The next stage of the attack is a buffer overflow to gain access on the host providing mail service. This is detected by the host sensor. Once again, the attack target is already included in the target list of the meta alert, so this alert is fused as well. It is important to note that the address of this compromised host is added to the list of attacker source hosts (that is, it is both a target and now a potential source for an attack).

Although telnet through the firewall is blocked, the attacker may telnet from the compromised internal host to the critical host, and he now does so. On that host, he uses another overflow attack to obtain root access. He is now free to change Web content. All these actions are detected, and fused into one meta alert expressing all stages of the attack. The priority at this point is extremely high (essentially 1.0).

It is worth pointing out that all the sensors in question provide response directives which, if followed, would stop the above attack in progress.

Summary 

As intrusion detection systems are more widely deployed, the problem of alert management assumes paramount importance. This is because a qualified domain expert is not able to examine all alerts produced on realistic enterprise networks by present systems, and at any rate such domain expertise is still not widespread. We have proposed a system that addresses two aspects of alert management, namely, alert correlation and prioritization. The probabilistic approach of these systems offers an alternative and a complement to the heuristic techniques more commonly used in intrusion detection.
We have adapted and extended notions from the field of multisensor data fusion for alert correlation. The extensions are principally in the area of generalizing feature similarity functions to comprehend observables in the intrusion detection domain. The approach [...] good but not perfect. A list of [...] the individual [...] a Bayes [...] whose observables are the [...] conditional probability relations [...] grouped [...] the output [...] calls for randomly generated alert exemplars. This training facility can be invoked for the system as a whole or for portions of the system corresponding to major attribute groupings, represented as Bayes subtrees.
Technical Appendix: Mathematical Description of Similarity and Similarity Expectation



Similarity 

In the following discussion, X is defined to be the value of a feature in a new alert, and Y is the value of that feature in an existing alert class to which it is being compared. For features that have a single value, the features of a new alert and an existing alert class are defined to be similar if their feature values are equal. That is, $\mathrm{Sim}(X,Y) = 1$ if $X = Y$, and 0 otherwise.

The more general case is one in which a feature of an alert includes a list of observed values, and a "hit count" of the number of times each value has been observed. For reasons of normalization, the hit count may be converted to a probability. In this sense, X and Y are lists of feature values and probabilities, possibly of different lengths. A probability vector describes a pattern over the observed categories, where

$P_X(C)$ = probability of category C in list X

$P_Y(C)$ = probability of category C in list Y

$P_X$ = probability vector over categories observed for X

$P_Y$ = probability vector over categories observed for Y

Note that these probability vectors are more general than the typical binary representation frequently employed in feature matching. We use the notation $C \in X$ to denote that category C occurs in list X. Then we can use a variety of similarity functions based on normalized dot products (or, equivalently, cosine separation of feature vectors), such as



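$$\mathrm{Sim}(X,Y) = \frac{\sum_{C \in X,\; C \in Y} P_X(C)\, P_Y(C)}{\sqrt{\sum_{C \in X} P_X(C)^2 \; \sum_{C \in Y} P_Y(C)^2}}$$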



The normalization gives this measure a geometric property that is not shared by, say, the raw dot product. If the vectors {1, 0}, {0, 1}, and {0.5, 0.5} are understood to be over the same two categories, then

$$\mathrm{Sim}(\{1,0\},\{0,1\}) = 0$$

$$\mathrm{Sim}(\{0.5,0.5\},\{0,1\}) = \mathrm{Sim}(\{0.5,0.5\},\{1,0\}) = \tfrac{1}{\sqrt{2}} \approx 0.707$$

In other words, the first two patterns are orthogonal, and the third is halfway in between. [...] multisensor data fusion [...]. A feature is also considered similar if it is included in the list of the candidate alert class to which it is being compared. A probe attack may search for a vulnerable service on some host, and then an attack on a specific host exploits the vulnerability. In this case, we consider the target "similar" if the target of the exploit is included in the target list of the probe. In addition, IP addresses have special similarity functions to express an intermediate value for addresses appearing to originate from the same subnet. Finally, in the case where a target host is compromised, its address is added to the attacker's address list, and it is a candidate for similarity evaluation as an attack originator.

Expectation of Similarity 

As was discussed earlier, the nature of an alert may change the expectation of which features of a new alert and an existing alert class should be similar. For example, a SYN flood is a type of attack in which the source IP address is typically forged. In this case, similarity in the source IP address would not be considered when assessing the overall similarity between a new alert triggered by a SYN flood attack and an existing alert class.

Expectation of similarity is a feature-specific number between 0 and 1, with 1 indicating a strong expectation of a match, and 0 indicating that two otherwise similar alerts are probably not similar with respect to a specific feature. An alert state is represented as a probability vector over candidate states. Each feature has a vector of the same length, whose elements are the similarity expectation values given each candidate state; together these vectors form an expectation table. This table may be mostly populated with MEDIUM (about 0.6) values, except where it is known otherwise. For example, in an access from a compromised host or in the case of a denial of service (DOS), there is a LOW expectation that the source IP will match that of the attacker.
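One way to hold the expectation table is a per-feature vector aligned with the list of candidate states, defaulting to MEDIUM (0.6, as in the text). The state names, feature names, and the LOW value of 0.1 are hypothetical choices for illustration only.

```python
from typing import Dict, List

# Hypothetical candidate alert states and features, for illustration only.
STATES = ["probe", "dos", "compromised_access"]
FEATURES = ["src_ip", "dst_ip", "dst_port"]

MEDIUM = 0.6  # default level, "about 0.6" per the text
LOW = 0.1     # assumed value for LOW; the text gives no number


def build_expectation_table() -> Dict[str, List[float]]:
    """Map each feature to a vector of similarity expectations, one entry per state."""
    table = {f: [MEDIUM] * len(STATES) for f in FEATURES}
    # Known exceptions: in a DOS (e.g. a SYN flood with forged source addresses)
    # or an access from a compromised host, the source IP is not expected to
    # match that of the attacker.
    table["src_ip"][STATES.index("dos")] = LOW
    table["src_ip"][STATES.index("compromised_access")] = LOW
    return table


table = build_expectation_table()
for feature, row in table.items():
    print(feature, dict(zip(STATES, row)))
```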

Based on the alert state, the similarity expectation may be dynamically generated as the weighted sum of the elements of an expectation table, with the weights taken from the (evolving) alert state distribution. This is itself an expectation in the statistical sense, taken over the belief distribution for the present alert state. It is general enough to cover the situation of an unambiguous call from a signature engine, in which case the distribution is 1 for the state corresponding to the call and 0 elsewhere. Algebraically, for $i \in \{\text{alert states}\}$,

$$
E_j = \sum_{i} \mathrm{BEL}(i)\, E_j(i)
$$

where

$E_j$ = expected similarity for feature $j$, given the present state
$\mathrm{BEL}(i)$ = belief that the attack state is presently $i$
$E_j(i)$ = element $i$ of the lookup table of expected similarity for feature $j$, that is, the expected similarity for feature $j$ given state $i$
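The weighted sum $E_j = \sum_i \mathrm{BEL}(i)\, E_j(i)$ is a dot product of the belief vector with one row of the expectation table. In this sketch the three states and the table values (MEDIUM 0.6, an assumed LOW of 0.1) are hypothetical; the one-hot belief shows the unambiguous signature-engine case.

```python
from typing import Sequence


def expected_similarity(belief: Sequence[float], table_row: Sequence[float]) -> float:
    """E_j = sum over states i of BEL(i) * E_j(i), for a single feature j."""
    if len(belief) != len(table_row):
        raise ValueError("belief and table row must cover the same candidate states")
    return sum(b * e for b, e in zip(belief, table_row))


# Hypothetical states, in order: probe, dos, compromised_access.
# Source-IP row of the expectation table: MEDIUM (0.6) except an assumed LOW (0.1).
src_ip_row = [0.6, 0.1, 0.1]

# Evolving alert state: the evidence is split between a probe and a DOS.
print(expected_similarity([0.5, 0.5, 0.0], src_ip_row))

# Unambiguous call from a signature engine: belief is 1 for "dos" and 0 elsewhere,
# so the expectation reduces to that state's table entry.
print(expected_similarity([0.0, 1.0, 0.0], src_ip_row))
```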
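The new alert and the alert class to which it is being compared each compute the similarity expectation for each feature. The overall similarity, given the feature similarities and the similarity expectations, is

$$
\mathrm{Sim}(X,Y) = \frac{\sum_j E_j^X E_j^Y \, \mathrm{Sim}(X_j, Y_j)}{\sum_j E_j^X E_j^Y}
$$

where

$j$ indexes features
$E_j^X$ ($E_j^Y$) = similarity expectation for feature $j$, alert $X$ ($Y$)
$\mathrm{Sim}(X_j, Y_j)$ = similarity of alerts $X$ and $Y$ with respect to feature $j$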

A similar definition can be formed by taking products rather than sums above, and then taking the geometric mean. This has some advantages, but can be overly influenced by a single extremely small value.
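A sketch of the overall similarity as an expectation-weighted arithmetic mean of per-feature similarities, next to the geometric-mean variant. It assumes the weight for feature j is the product $E_j^X E_j^Y$ of the two alerts' expectations; the source does not pin down this combination, so treat it as an assumption, as are the function names. The example shows a single near-zero feature similarity pulling the geometric mean far below the arithmetic mean, and a zero expectation (as for the source IP of a SYN flood) removing a feature entirely.

```python
import math
from typing import List, Sequence


def feature_weights(ex: Sequence[float], ey: Sequence[float]) -> List[float]:
    """Assumed combination: weight for feature j is E_j^X * E_j^Y."""
    return [a * b for a, b in zip(ex, ey)]


def overall_similarity(sims: Sequence[float], ex: Sequence[float], ey: Sequence[float]) -> float:
    """Expectation-weighted arithmetic mean of per-feature similarities."""
    w = feature_weights(ex, ey)
    total = sum(w)
    if total == 0.0:
        return 0.0  # no feature is expected to match
    return sum(wj * sj for wj, sj in zip(w, sims)) / total


def overall_similarity_geometric(sims: Sequence[float], ex: Sequence[float], ey: Sequence[float]) -> float:
    """Weighted geometric mean: products replace sums, then take the total-weight root."""
    w = feature_weights(ex, ey)
    total = sum(w)
    if total == 0.0:
        return 0.0
    # A feature with zero weight contributes s ** 0 == 1, so it is ignored.
    return math.prod(sj ** wj for wj, sj in zip(w, sims)) ** (1.0 / total)


# Three features with full expectation; one similarity is extremely small.
sims = [0.9, 0.8, 0.01]
ones = [1.0, 1.0, 1.0]
print(round(overall_similarity(sims, ones, ones), 3))
print(round(overall_similarity_geometric(sims, ones, ones), 3))

# Zero expectation for the source IP (feature 0), as for a SYN flood:
# that feature's mismatch no longer affects either mean.
print(overall_similarity([0.0, 0.9], [0.0, 1.0], ones[:2]))
print(overall_similarity_geometric([0.0, 0.9], [0.0, 1.0], ones[:2]))
```

Here the arithmetic mean stays near 0.57, while the geometric mean drops to roughly 0.19 because of the single 0.01 similarity.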



Global changes:

[...] to multisensor
port scan to portsweep
net to network
subnet to subnetwork
[...] to Internet
web to Web



Be consistent throughout: portsweep, port sweep, port scan, stealthy portsweep, stealth port
Be consistent: thread identifier or thread ID

At first use, establish IDS as an abbreviation for intrusion detection system and use it thereafter, or spell it out in full throughout the paper.

If ID is used for "identifier", do not use it for "intrusion detection" (unless as part of IDS).

Be sure that TCP and CPT are not confused, and define CPT at first use.

Check "set of CPT": should it be "set of CPTs"? Note that set needs a singular verb: "set is".
Note that CPT needs a singular verb: "the CPT is", but "these CPTs are".

Run spell check.



